
An Improved Analysis of Training Over-parameterized Deep Neural Networks

Zou, Difan, Gu, Quanquan

Neural Information Processing Systems

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size $n$ (e.g., $O(n^{24})$). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm. By specializing our result to two-layer (i.e., one-hidden-layer) neural networks, it also provides a milder over-parameterization condition than the best-known result in prior work.
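To make the setting concrete, below is a minimal, illustrative sketch (not the paper's construction or bounds): full-batch gradient descent on a two-layer ReLU network whose width m greatly exceeds the sample size n, which is the over-parameterized regime the convergence analysis concerns. The data, width, step size, and fixed output layer are arbitrary choices made here for illustration only.

```python
# Illustrative sketch of gradient descent on an over-parameterized two-layer
# ReLU network (width m >> sample size n). All constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 20, 5, 2000                       # n samples, input dim d, width m >> n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs, common in such analyses
y = rng.standard_normal(n)

# Random Gaussian initialization of the hidden layer; the output layer uses
# fixed +/-1 weights and only W is trained -- a common simplification.
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m)

eta = 0.5
for t in range(200):
    H = np.maximum(X @ W.T, 0.0)            # ReLU features, shape (n, m)
    pred = H @ a / np.sqrt(m)               # 1/sqrt(m) output scaling
    resid = pred - y
    loss = 0.5 * np.mean(resid ** 2)        # squared loss
    # Almost-everywhere gradient of the loss w.r.t. W (ReLU is piecewise linear).
    G = ((H > 0) * (resid[:, None] * a[None, :] / np.sqrt(m))).T @ X / n
    W -= eta * G
    if t % 50 == 0:
        print(f"step {t:3d}  loss {loss:.6f}")
```

In this wide regime the training loss decreases steadily from random initialization; the paper's contribution is a sharper analysis of how wide the network must be for such convergence to be guaranteed, not this particular training loop.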


Reviews: An Improved Analysis of Training Over-parameterized Deep Neural Networks

Neural Information Processing Systems

While this paper makes a nice contribution to an important problem, I am not sure if it is significant enough for the conference. The overall outline of the analysis follows closely that of [2], and the main new component is the improved gradient lower bound, which largely builds on the bounds in [2] and [16]. Although the improved analysis provides new insight and I find it useful, I do not feel that it will have a big impact. The other technical contribution on the improved trajectory length is also nice, but again I feel that it is somewhat incremental. The results seem technically sound; the proofs all look reasonable, although I did not verify them thoroughly.


Reviews: An Improved Analysis of Training Over-parameterized Deep Neural Networks

Neural Information Processing Systems

This paper analyzes the convergence of GD and SGD for over-parameterized networks, which results in an improved over-parameterization requirement. Initially the paper received weakly positive reviews. Specifically, the reviewers felt that while the contribution closely follows prior work [2, 16], the paper still makes a nice contribution that is insightful and novel. The rebuttal addressed the issues raised by the reviewers. While one reviewer remained concerned as to whether the ideas in the paper can be extended further, upon discussion, the reviewers agreed that the paper should be accepted.

